AITopics | wide and deep neural network

Critical Initialization of Wide and Deep Neural Networks using Partial Jacobians: General Theory and Applications

Neural Information Processing Systems | Dec-26-2025, 05:28:17 GMT

Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and a quantitatively predictive description is possible. The Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new practical way to diagnose criticality.

critical initialization, partial jacobian, wide and deep neural network, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Neural Information Processing Systems | Dec-25-2025, 23:57:26 GMT

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a \textit{neural tangent random feature} (NTRF) model. For data distributions that can be classified by an NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
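The idea of a random feature model induced by the network gradient at initialization can be illustrated with a small sketch. The setup below is an assumption made for brevity, not the paper's construction: a two-layer ReLU network in which only the hidden weights are treated as trainable, with hypothetical helper names (`init_params`, `ntrf_features`, `ntrf_predict`, `tangent_kernel`). The features of an input are the gradient of the network output with respect to the hidden weights at their random initial values, and the inner product of two feature vectors gives the corresponding tangent kernel entry.

```python
import math
import random


def init_params(d, m, seed=0):
    """Hypothetical two-layer ReLU net: f(x) = m^{-1/2} * sum_j a_j * relu(w_j . x)."""
    rng = random.Random(seed)
    W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(m)]  # fixed output signs
    return W0, a


def forward(x, W, a):
    """Network output for input x and hidden weights W."""
    total = 0.0
    for w_j, a_j in zip(W, a):
        pre = sum(w * xi for w, xi in zip(w_j, x))
        total += a_j * max(pre, 0.0)
    return total / math.sqrt(len(W))


def ntrf_features(x, W0, a):
    """Gradient of f(x; W) w.r.t. the hidden weights, evaluated at W0, flattened."""
    scale = 1.0 / math.sqrt(len(W0))
    feats = []
    for w_j, a_j in zip(W0, a):
        pre = sum(w * xi for w, xi in zip(w_j, x))
        gate = 1.0 if pre > 0 else 0.0  # derivative of ReLU at the preactivation
        feats.extend(a_j * gate * scale * xi for xi in x)
    return feats


def ntrf_predict(x, W0, a, delta):
    """First-order model: f(x; W0) + <grad_W f(x; W0), delta>, delta = flattened W - W0."""
    phi = ntrf_features(x, W0, a)
    return forward(x, W0, a) + sum(p * dl for p, dl in zip(phi, delta))


def tangent_kernel(x1, x2, W0, a):
    """Inner product of the gradient features of two inputs."""
    phi1 = ntrf_features(x1, W0, a)
    phi2 = ntrf_features(x2, W0, a)
    return sum(p * q for p, q in zip(phi1, phi2))
```

The model is linear in `delta`, so fitting it is a convex problem over fixed random features; the abstract's claim concerns how well such a model can classify the data, not this particular two-layer simplification.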

generalization bound, stochastic gradient descent, wide and deep neural network, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)

Reviews: Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Neural Information Processing Systems | Jan-27-2025, 04:05:54 GMT

Originality: To the best of my knowledge, the results are novel and provide important extensions and improvements over the prior art. Quality: I did a high-level check of the proofs and they seem sound to me. Clarity: The paper is a joy to read. The problem definition, assumptions, algorithm, and statement of results are very well presented. Significance: The results provide several extensions and improvements over previous work, including training deeper models, training all layers, training with SGD (rather than GD), and a smaller required overparameterization.

generalization bound, stochastic gradient descent, wide and deep neural network

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Reviews: Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Neural Information Processing Systems | Jan-27-2025, 04:05:44 GMT

This paper provides a generalization bound for training over-parameterized deep neural networks with ReLU activation and cross-entropy loss using SGD. Initially the paper received mixed reviews: two positive and one negative. On the one hand, the analysis is found to be intuitive, general, and potentially influential, the generalization bound is found to be more general and sharper than many existing generalization error bounds for over-parameterized neural networks, and the paper is found to be very well written. On the other hand, the width requirement is found to be too strict. The rebuttal addressed the issues raised by the reviewers: one rating was increased from 6 to 8, and the score of the negative review was updated to 6. Upon discussion, the reviewers agreed that the paper should be accepted.

generalization bound, stochastic gradient descent, wide and deep neural network, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

Critical Initialization of Wide and Deep Neural Networks using Partial Jacobians: General Theory and Applications

Neural Information Processing Systems | Jan-19-2025, 10:28:20 GMT

critical initialization, general theory and application, wide and deep neural network, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Neural Information Processing Systems | Oct-10-2024, 23:15:34 GMT

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a \textit{neural tangent random feature} (NTRF) model. For data distributions that can be classified by an NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.

generalization error, stochastic gradient descent, wide and deep neural network, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm

Doshi, Darshil, He, Tianyu, Gromov, Andrey

arXiv.org Machine Learning | Nov-30-2021

Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and a quantitatively predictive description is possible. The Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new way to diagnose (both theoretically and empirically) this criticality. To that end, we introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0$ ...
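As a rough illustration of how a partial Jacobian can serve as an empirical diagnostic, the sketch below estimates the width-normalized squared Frobenius norm of the derivative of the preactivations in layer `depth` with respect to those in an earlier layer `l0`. Everything here is an assumption chosen for the example rather than the paper's procedure: a bias-free fully connected ReLU network, weights drawn with variance `sigma_w2 / width`, a single random draw of the weights, and the hypothetical function name `partial_jacobian_norm`. Random probe vectors are pushed through the layer-to-layer Jacobians, so the full Jacobian matrix is never formed.

```python
import math
import random


def matvec(W, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def partial_jacobian_norm(width, depth, sigma_w2, l0=1, n_probes=4, seed=0):
    """Monte Carlo estimate of ||d h^depth / d h^l0||_F^2 / width for one weight draw.

    Assumed model (illustrative): h^{l+1} = W^{l+1} relu(h^l), no biases,
    W entries ~ N(0, sigma_w2 / width).
    """
    rng = random.Random(seed)
    w_std = math.sqrt(sigma_w2 / width)
    # Random preactivations in layer l0 stand in for a propagated input.
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    # For v ~ N(0, I / width), E||J v||^2 = ||J||_F^2 / width.
    probe_std = 1.0 / math.sqrt(width)
    probes = [[rng.gauss(0.0, probe_std) for _ in range(width)]
              for _ in range(n_probes)]
    for _ in range(l0, depth):
        W = [[rng.gauss(0.0, w_std) for _ in range(width)] for _ in range(width)]
        gate = [1.0 if z > 0 else 0.0 for z in h]  # relu'(h^l)
        # One-layer Jacobian is W diag(relu'(h^l)); apply it to every probe.
        probes = [matvec(W, [g * p for g, p in zip(gate, v)]) for v in probes]
        h = matvec(W, [g * z for g, z in zip(gate, h)])
    return sum(sum(p * p for p in v) for v in probes) / n_probes
```

Under these assumptions, `sigma_w2 = 2.0` keeps the estimate of order one as `depth` grows, while smaller or larger values make it shrink or grow roughly like `(sigma_w2 / 2) ** (depth - l0)`, which is the kind of behavior a criticality diagnostic is meant to expose.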

jacobian, layernorm, wide and deep neural network, (11 more...)

arXiv.org Machine Learning

2111.12143

Country:

North America > United States > Rhode Island (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Cao, Yuan, Gu, Quanquan

Neural Information Processing Systems | Mar-19-2020, 01:02:26 GMT

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a \textit{neural tangent random feature} (NTRF) model. For data distributions that can be classified by an NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.

generalization error, stochastic gradient descent, wide and deep neural network, (2 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Wide and Deep Neural Network for Survival Analysis from Anatomical Shape and Tabular Clinical Data

Pölsterl, Sebastian, Sarasua, Ignacio, Gutiérrez-Becker, Benjamín, Wachinger, Christian

arXiv.org Machine Learning | Sep-9-2019

We introduce a wide and deep neural network for predicting the progression of patients from mild cognitive impairment to Alzheimer's disease. Information from anatomical shape and tabular clinical data (demographics, biomarkers) is fused in a single neural network. The network is invariant to shape transformations and avoids the need to identify point correspondences between shapes. To account for right-censored time-to-event data, i.e., when it is only known that a patient did not develop Alzheimer's disease up to a particular time point, we employ a loss commonly used in survival analysis. Our network is trained end-to-end to combine information from a patient's hippocampus shape and clinical biomarkers. Our experiments on data from the Alzheimer's Disease Neuroimaging Initiative demonstrate that our proposed model is able to learn a shape descriptor that augments clinical biomarkers and outperforms a deep neural network on shape alone and a linear model on common clinical biomarkers.
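The abstract does not name the survival loss it uses. As an assumption for illustration only, the sketch below implements one common choice for right-censored data, the negative Cox partial log-likelihood. The function name `cox_neg_partial_log_likelihood` and the argument layout are hypothetical; `risk_scores` stands for the scalar outputs of any model, such as a network that fuses shape and tabular inputs.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: predicted log-risk per patient (higher means earlier event).
    times:       observed time per patient (event time or censoring time).
    events:      1 if the event was observed, 0 if the patient was censored.
    Ties are handled in the simple Breslow style (illustrative choice).
    """
    n_events = sum(events)
    if n_events == 0:
        return 0.0  # no observed events, so the partial likelihood is empty
    loss = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at patient i's event time.
        risk_set = [risk_scores[j] for j in range(len(times))
                    if times[j] >= times[i]]
        # Numerically stable log-sum-exp over the risk set.
        top = max(risk_set)
        log_sum = top + math.log(sum(math.exp(r - top) for r in risk_set))
        loss += log_sum - risk_scores[i]
    return loss / n_events
```

A censored patient never adds a term of its own but still appears in the risk sets of earlier events, which is how the loss uses the information that the patient had not progressed up to the censoring time.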

alzheimer, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

1909.0389

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
